PARSING TURKISH USING THE LEXICAL FUNCTIONAL GRAMMAR FORMALISM 1 

Zelal Güngördü
Centre for Cognitive Science
University of Edinburgh
Edinburgh, Scotland, U.K.
gungordu@cogsci.ed.ac.uk

Kemal Oflazer
Department of Computer Engineering
Bilkent University
Bilkent, Ankara, Turkey
ko@cs.bilkent.edu.tr


Abstract

This paper describes our work on parsing Turkish
using the lexical-functional grammar formalism. This
work represents the first effort for parsing Turkish. Our
implementation is based on Tomita's parser developed at
Carnegie-Mellon University Center for Machine Translation.
The grammar covers a substantial subset of Turkish
including simple and complex sentences, and deals with a
reasonable amount of word order freeness. The complex
agglutinative morphology of Turkish lexical structures is
handled using a separate two-level morphological analyzer.
After a discussion of key relevant issues regarding Turkish
grammar, we discuss aspects of our system and present results
from our implementation. Our initial results suggest
that our system can parse about 82% of the sentences directly
and almost all of the remaining sentences with very minor
pre-editing.

1 INTRODUCTION 

As part of our ongoing work on the development of computational
resources for natural language processing in Turkish
we have undertaken the development of a parser for
Turkish using the lexical-functional grammar formalism,
for use in a number of applications. This work represents
the first approach to the computational analysis of Turkish,
though there have been a number of studies of Turkish
syntax from a linguistic perspective (e.g., [Meskill 1970]).
Our implementation is based on Tomita's parser developed
at Carnegie-Mellon University Center for Machine Translation
[Musha et.al. 1988, Tomita 1987]. Our grammar covers
a substantial subset of Turkish including simple and
complex sentences, and deals with a reasonable amount of
word order freeness.

Turkish has two characteristics that have to be taken into
account: agglutinative morphology, and rather free word
order with explicit case marking. We handle the rather complex
agglutinative morphology of the Turkish lexical structures
using a separate morphological processor based on
the two-level paradigm [Evans 1990, Oflazer 1993] that we
have integrated with the lexical-functional grammar parser.
Word order freeness is dealt with by relaxing the order of
phrases in the phrase structure parts of lexical-functional
grammar rules by means of generalized phrases.


1 This work was done as a part of the first author's M.Sc. degree work
at the Department of Computer Engineering of Bilkent University, Ankara,
06533 Turkey.


2 LEXICAL-FUNCTIONAL GRAMMAR 

Lexical-functional grammar (LFG) is a linguistic theory
which fits nicely into computational approaches that use
unification [Shieber 1986]. A lexical-functional grammar
assigns two levels of syntactic description to every sentence
of a language: a constituent structure and a functional
structure. Constituent structures (c-structures) characterize
the phrase structure configurations as a conventional phrase
structure tree, while surface grammatical functions such as
subject, object, and adjuncts are represented in functional
structure (f-structure). Because of space limitations we will
not go into the details of the theory. One can refer to Kaplan
and Bresnan [Kaplan and Bresnan 1982] for a thorough
discussion of the LFG formalism.
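
To make the role of unification concrete, the following is a minimal sketch, not taken from the parser described in this paper, in which f-structures are represented as nested Python dictionaries. The function name `unify` and the feature names (`SUBJ`, `AGR`, `CASE`, `TENSE`) are illustrative assumptions.

```python
from typing import Optional

FStruct = dict  # feature name -> atomic value (str) or nested FStruct


def unify(a: FStruct, b: FStruct) -> Optional[FStruct]:
    """Return the unification of two f-structures, or None on a clash."""
    result = dict(a)
    for feature, b_val in b.items():
        if feature not in result:
            result[feature] = b_val  # feature only present in b
            continue
        a_val = result[feature]
        if isinstance(a_val, dict) and isinstance(b_val, dict):
            sub = unify(a_val, b_val)  # unify nested structures recursively
            if sub is None:
                return None
            result[feature] = sub
        elif a_val != b_val:
            return None  # atomic values disagree
    return result


subject = {"SUBJ": {"AGR": "3PL", "CASE": "NOM"}}
verb = {"SUBJ": {"AGR": "3PL"}, "TENSE": "PAST"}

print(unify(subject, verb))                      # compatible: merged structure
print(unify(subject, {"SUBJ": {"AGR": "1SG"}}))  # agreement clash: None
```

Agreement checks of the kind discussed in the following sections can be read as unification failures on features such as `AGR`.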

3 TURKISH SYNTAX 

In this section, we would like to highlight two of the relevant
key issues in Turkish grammar, namely highly inflected
agglutinative morphology and free word order, and give a
description of the structural classification of Turkish sentences
that we deal with.

3.1 Morphology 

Turkish is an agglutinative language with word structures
formed by productive affixations of derivational and inflectional
suffixes to root words [Oflazer 1993]. This extensive
use of suffixes causes morphological parsing of words to be
rather complicated, and results in ambiguous lexical interpretations
in many cases. For example:

(1) çocukları

a. child+PLU+3SG-POSS his children 

b. child+3PL-POSS their child 

c. child+PLU+3PL-POSS their children 

d. child+PLU+ACC children (acc.) 

Such ambiguity can sometimes be resolved at phrase
and sentence levels with the help of agreement requirements,
though this is not always possible:

(2a) Onların çocukları geldiler.
     it+PLU+GEN child+PLU+3PL-POSS come+PAST+3PL
     (Their children came.)





Table 1: Percentage of different word orders in Turkish.

Sentence Type    Children's Speech    Adult Speech
SOV              46%                  48%
OSV              7%                   8%
SVO              17%                  25%
OVS              20%                  13%
VSO              10%                  6%
VOS              0%                   0%


(2b) Çocukları geldiler.
     child+PLU+3SG-POSS come+PAST+3PL
     (His children came.)
     child+PLU+3PL-POSS come+PAST+3PL
     (Their children came.)

For example, in (2a) only the interpretation (1c) (i.e., their
children) is possible because:

• the agreement requirement between the modifier and
the modified parts in a possessive compound noun
eliminates (1a). 2

• the facts that gel (come) does not subcategorize for
an accusative marked direct object, and that in Turkish
the subject of a sentence must be nominative 3 eliminate
(1d).

• the agreement requirement between the subject and the
verb of a sentence eliminates (1b). 4

In (2b), both (1a) and (1c) are possible (his children, and
their children, respectively) because the modifier of the
possessive compound noun is a covert one: it may be either
onun (his) or onların (their). The other two interpretations
are eliminated due to the same reasons as in (2a).
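
The elimination of readings described for (2a) and (2b) can be sketched as a simple filter over the four analyses in (1). This is only an illustration under stated assumptions: the `Analysis` class and `subject_readings` function are hypothetical names, the verb is assumed to be intransitive (like gel), and possessive agreement is checked strictly, so the third person plural exception mentioned in footnote 2 is deliberately not modeled.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Analysis:
    label: str
    plural: bool
    poss: Optional[str] = None  # possessive agreement, e.g. "3SG"
    case: str = "NOM"


# The four analyses of "çocukları" listed in (1).
COCUKLARI = [
    Analysis("1a", plural=True, poss="3SG"),
    Analysis("1b", plural=False, poss="3PL"),
    Analysis("1c", plural=True, poss="3PL"),
    Analysis("1d", plural=True, case="ACC"),
]


def subject_readings(analyses: List[Analysis], verb_agr: str,
                     modifier_agr: Optional[str] = None) -> List[str]:
    """Keep the analyses that can be the subject of an intransitive verb."""
    kept = []
    for a in analyses:
        if a.case != "NOM":
            continue  # subjects are nominative; the verb takes no ACC object
        if modifier_agr is not None and a.poss != modifier_agr:
            continue  # an overt genitive modifier must match the possessive
        subj_agr = "3PL" if a.plural else "3SG"
        if subj_agr != verb_agr and not (subj_agr == "3PL" and verb_agr == "3SG"):
            continue  # subject-verb agreement (3PL subjects may take 3SG verbs)
        kept.append(a.label)
    return kept


print(subject_readings(COCUKLARI, verb_agr="3PL", modifier_agr="3PL"))  # (2a)
print(subject_readings(COCUKLARI, verb_agr="3PL"))                      # (2b)
```

With the overt modifier onların, only (1c) survives; with a covert modifier, (1a) and (1c) both remain, as described in the text.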

3.2 Word Order 

If we concern ourselves with the typical order of constituents,
Turkish can be characterized as being a
subject-object-verb (SOV) language, though the data in Table 1
from Erguvanlı [Erguvanlı 1979] shows that other orders
for constituents are also common (especially in discourse).
In Turkish it is not the position, but the case of a noun
phrase that determines its grammatical function in the sentence.
Consequently the typical order of the constituents may
change rather freely without affecting the grammaticality
of a sentence. Due to various syntactic and pragmatic
constraints, sentences with the non-typical orders are not

2 The agreement of the modifier must be the same as the possessive of
the modified with the exception that if the modifier is third person plural
the possessive of the modified may be third person singular.

3 In Turkish, the nominative case is unmarked.

4 In a Turkish sentence, person features of the subject and the verb
should be the same. This is true also for the number features with one
exception: third person plural subjects may sometimes take third person
singular verbs.


stylistic variants of the typical versions which can be used
interchangeably in any context [Erguvanlı 1979]. For example,
a constituent that is to be emphasized is generally
placed immediately before the verb. This affects the places
of all the constituents in a sentence except that of the verb:

(3a) Ben çocuğa kitabı verdim.
     I child+DAT book+ACC give+PAST+1SG
     (I gave the book to the child.)

(3b) Çocuğa kitabı ben verdim.
     child+DAT book+ACC I give+PAST+1SG
     (It was me who gave the child the book.)

(3c) Ben kitabı çocuğa verdim.
     I book+ACC child+DAT give+PAST+1SG
     (It was the child to whom I gave the book.)

(3a) is an example of the typical word order whereas in
(3b) the subject, ben, is emphasized. Similarly, in (3c) the
indirect object, çocuğa, is emphasized.
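
The point that case, rather than position, determines grammatical function can be illustrated with a small sketch over (3a-c). The mapping `CASE_TO_FUNCTION` and the function labels are simplifying assumptions made for this example only; in particular, nominative (indefinite) direct objects are not handled here.

```python
# Assumed, simplified mapping from case marking to grammatical function.
CASE_TO_FUNCTION = {"NOM": "SUBJ", "ACC": "OBJ", "DAT": "IOBJ"}


def grammatical_functions(constituents):
    """Map each (word, tag) pair to a function using case, not position."""
    roles = {}
    for word, tag in constituents:
        role = "VERB" if tag == "V" else CASE_TO_FUNCTION[tag]
        roles[role] = word
    return roles


s3a = [("ben", "NOM"), ("çocuğa", "DAT"), ("kitabı", "ACC"), ("verdim", "V")]
s3b = [("çocuğa", "DAT"), ("kitabı", "ACC"), ("ben", "NOM"), ("verdim", "V")]
s3c = [("ben", "NOM"), ("kitabı", "ACC"), ("çocuğa", "DAT"), ("verdim", "V")]

for sentence in (s3a, s3b, s3c):
    print(grammatical_functions(sentence))
```

All three orders yield the same assignment of functions; what differs between them is emphasis, which this sketch does not represent.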

In addition to these possible changes, the verb itself may
move away from its typical place, i.e., the end of the sentence.
Such sentences are called inverted sentences and are
typically used in informal prose and discourse.

However, this looseness of ordering constraints at sentence
level does not extend into all syntactic levels. There
are even constraints at sentence level:

• A nominative direct object should be placed immediately
before the verb. 5 Hence, (5b) is ungrammatical: 6

(5a) Ben çocuğa kitap verdim.
     I child+DAT book give+PAST+1SG
     (I gave a book to the child.)

(5b) *Çocuğa kitap ben verdim.
     child+DAT book I give+PAST+1SG

• Some adverbial complements of quality (those that are
actually qualitative adjectives) always precede the verb or,
if it exists, the indefinite direct object:

(6a) Yemeği iyi pişirdin.
     meal+ACC good cook+PAST+2SG
     (You cooked the meal well.)

(6b) İyi yemeği pişirdin.
     good meal+ACC cook+PAST+2SG
     (You cooked the good meal.)

(6c) İyi yemek pişirdin.
     good meal cook+PAST+2SG
     (You cooked a good meal./You cooked a meal well.)

Note that although (6b) is grammatical, iyi is no longer an
adverbial complement, but is an adjective that modifies
yemeği. Note also that (6c) is ambiguous: iyi can be
interpreted either as an adjective modifying yemek or as an

5 In Turkish, a transitive verb that subcategorizes for a direct object can
take either an accusative marked or a nominative marked (unmarked on
the surface) noun phrase for that object. The function of accusative case
marking is to indicate that the object refers to a particular definite entity,
though there are very rare cases where this is not the case.

6 Note that (3b,c) are grammatical since the direct object, kitabı, is
marked accusative.






adverb modifying pişirdin. 7

3.3 Structural Classification of Sentences 

The following summarizes the major classes of sentences in
Turkish.

• Simple Sentences: A simple sentence contains only one
independent judgement. The sentences in (2), (3), (4a),
(5a), and (6) are all examples of simple sentences.

• Complex Sentences: In Turkish, a sentence can be transformed
into a construction with a verbal noun, a participle
or a gerund by affixing certain suffixes to the verb of the
sentence. Complex sentences are those that include such
dependent (subordinate) clauses as their constituents, or as
modifiers of their constituents. Dependent clauses may
themselves contain other dependent clauses. So, we may
have embedded structures such as:


(7) Burada içilebilecek su bulamayacağımı zannetmek doğru olmazdı.

    Burada          here+LOC
    içilebilecek    drink+PASS+POT+FUT+PART
    su              water
    bulamayacağımı  find+NEG-POT+FUT+PART+1SG-POSS+ACC
    zannetmek       think+INF
    doğru           right
    olmazdı         be+NEG+AOR+PAST+3SG

    (It wouldn't have been right for me to think that I wouldn't
    be able to find drinkable water here.)

The subject of (7) (burada içilebilecek su bulamayacağımı
zannetmek - to think that I wouldn't be able to find
drinkable water here) is a nominal dependent clause whose
definite object (burada içilebilecek su bulamayacağımı -
that I wouldn't be able to find drinkable water here) is an
adjectival dependent clause which acts as a nominal one.
The indefinite object of this definite object (içilebilecek su
- drinkable water) is a compound noun whose modifier
part is another adjectival dependent clause (içilebilecek -
drinkable), and modified part is a noun (su - water).

It should be noted that there are other types of sentences
in the classification according to structure. However, we
will not be concerned with them here because of space
limitations. (See Şimşek [Şimşek 1987], and Güngördü
[Güngördü 1993] for details.)

4 SYSTEM ARCHITECTURE AND IMPLEMENTATION

We have implemented our parser in the grammar development
environment of the Generalized LR Parser/Compiler
developed at Carnegie Mellon University Center for Machine
Translation. No attempt has been made to include

7 The second interpretation is possible since yemek is an indefinite direct 
object. 


Figure 1: The system architecture. (Diagram not reproduced;
it shows the flow from the input sentence to the output
f-structure(s).)


morphological rules as the parser lets us incorporate our
own morphological analyzer for which we use a full scale
two-level specification of Turkish morphology based on a
lexicon of about 24,000 root words [Oflazer 1993]. This
lexicon is mainly used for morphological analysis and has
limited additional syntactic and semantic information, and
is augmented with an argument structure database. 8

Figure 1 shows the architecture of our system. When
a sentence is given as input to the program, the program
first calls the morphological analyzer for each word in the
sentence, and keeps the results of these calls in a list to
be used later by the parser. 9 If the morphological analyzer
fails to return a structure for a word for any reason (e.g.,
the lexicon may lack the word or the word may be misspelled),
the program returns with an error message. After
the morphological analysis is completed, the parser is invoked
to check whether the sentence is grammatical. The
parser performs bottom-up parsing. During this analysis,
whenever it consumes a new word from the sentence, it
picks up the morphological structure of this word from the
list. If the word is a finite verb or an infinitival, the parser is
also provided with the subcategorization frame of the word.
At the end of the analysis, if the sentence is grammatical,
its f-structure is output by the parser.
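
The control flow just described can be sketched as follows. This is a hedged outline only: `run_pipeline`, `MorphologyError`, `toy_analyze`, `toy_parse`, and the two-word toy lexicon are hypothetical stand-ins introduced for illustration, and do not correspond to the actual interface of the Generalized LR Parser/Compiler or of our morphological analyzer.

```python
from typing import Callable, Dict, List

Analysis = Dict[str, str]


class MorphologyError(Exception):
    """Raised when a word has no morphological analysis."""


def run_pipeline(
    sentence: str,
    analyze: Callable[[str], List[Analysis]],
    parse: Callable[[List[List[Analysis]]], List[dict]],
) -> List[dict]:
    # Step 1: analyze every word up front and keep the results in a list.
    per_word = []
    for word in sentence.split():
        analyses = analyze(word)
        if not analyses:  # unknown or misspelled word
            raise MorphologyError(f"no analysis for {word!r}")
        per_word.append(analyses)
    # Step 2: hand the stored analyses to the parser, which returns one
    # f-structure per successful parse (an empty list if ungrammatical).
    return parse(per_word)


# Toy stand-ins, for illustration only.
TOY_LEXICON = {
    "top": [{"CAT": "N", "R": "top", "AGR": "3SG", "CASE": "NOM"}],
    "hızlandı": [{"CAT": "V", "R": "hızlan", "ASPECT": "PAST", "AGR": "3SG"}],
}


def toy_analyze(word: str) -> List[Analysis]:
    return TOY_LEXICON.get(word, [])


def toy_parse(per_word: List[List[Analysis]]) -> List[dict]:
    # Accepts only "N V" with matching agreement; a real parser is far richer.
    if len(per_word) != 2:
        return []
    return [
        {"SUBJ": n, "VERB": v}
        for n in per_word[0]
        for v in per_word[1]
        if n["CAT"] == "N" and v["CAT"] == "V" and n["AGR"] == v["AGR"]
    ]


print(run_pipeline("top hızlandı", toy_analyze, toy_parse))
```

Because each word may have several analyses, the parser receives a list of alternatives per word and is left to resolve the ambiguity, as noted in footnote 9.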


8 The morphological analyzer returns a list of feature-value pairs, for
instance for the word evdekilerin (of those (things) in the house/your things
in the house) it will return

1. ((*CAT* N)(*R* "ev")(*CASE* LOC)(*CONV* ADJ "ki")(*AGR* 3PL)(*CASE* GEN))

2. ((*CAT* N)(*R* "ev")(*CASE* LOC)(*CONV* ADJ "ki")(*AGR* 3PL)(*POSS* 2SG))

9 Recall that there may be a number of morphologically ambiguous
interpretations of a word. In such a case, the morphological analyzer
returns all of the possible morphological structures in a list, and the parser
takes care of the ambiguity regarding the grammar rules.









Table 2: The number of rules for each category in the grammar.

Category                  Number of Rules
Noun phrases              17
Adjectival phrases        10
Postpositional phrases    24
Adverbial constructs      50
Verb phrases              21
Dependent clauses         14
Sentences                 6
Lexical look up rules     11
TOTAL                     153


5 THE GRAMMAR 

In this section, we present an overview of the LFG specification
that we have developed for Turkish syntax. Our
grammar includes rules for sentences, dependent clauses,
noun phrases, adjectival phrases, postpositional phrases,
adverbial constructs, verb phrases, and a number of lexical
look up rules. 10 Table 2 presents the number of rules
for each category in the grammar. There are also some
intermediary rules, not shown here.

Recall that the typical order of constituents in a sentence
may change due to a number of reasons. Since the order of
phrases is fixed in the phrase structure component of an LFG
rule, this rather free nature of word order at the sentence level
constitutes a major problem. In order to avoid writing a
separate, largely redundant rule for each possible order, we
adopt the following strategy in our rules: We use the same
place holder, <XP>, for all the syntactic categories in the
phrase structure component of a sentence or a dependent
clause rule, and check the categories of these phrases in the
equations part of the rule. In Figure 2, we give a grammar
rule for the sentence with two constituents, with an informal
description of the equation part. 11

Recall also that an indefinite object should be placed
immediately before the verb, and some adverbial complements
of quality (those that are actually qualitative adjectives)
always precede the verb or, if it exists, the indefinite direct
object. In our grammar, we treat such objects and adverbial
complements as parts of the verb phrase. So, we do not
check these constraints at the sentence or dependent clause
level.

6 PERFORMANCE EVALUATION 

In this section, we present some results about the performance
of our system on test runs with four different texts on
different topics. All of the texts are articles taken from
magazines. We used the CMU Common Lisp system running

10 Recall that no morphological rules are included. The lexical look up
rules are used just to call the morphological analyzer.

11 Note that x0, x1, and x2 refer to the functional structures of the sentence,
the first constituent and the second constituent in the phrase structure,
respectively.


(<S> <==> (<XP> <XP>)

  1) if x1's category is VP then
       assign x1 to the functional structure
       of the verb of the sentence
     if x2's category is VP then
       assign x2 to the functional structure
       of the verb of the sentence

  2) for i = 1 to 2 do
       if xi has already been assigned to
       the verb then do nothing
       if xi's category is ADVP then
         add xi to the adverbial complements
         of the sentence
       if xi's category is NP and
       xi's case is nominative then
         assign xi to the functional structure
         of the subject of the sentence
       if xi's category is NP then
         if the verb of the sentence can take
         an object with this case (consider
         also the voice of the verb)
           add xi to the objects of the verb

  3) check if the verb has taken all the
     objects that it has to take

  4) make sure that the verb has not
     taken more than one object with
     the same thematic role

  5) check if the subject and the verb
     agree in number and person:
       if the subject is defined (overt)
       then
         if the agreement feature of the
         subject is third person plural
         then the agreement feature of the
         verb may be either third person
         singular or third person plural
         else
           the agreement features of the
           subject and the verb must be
           the same
       else if the subject is undefined
       (covert) then assign the
       agreement feature of the verb
       to that of the subject

Figure 2: An LFG rule for the sentence level given with an
informal description of the equation part.
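
The informal description in Figure 2 can be approximated in Python as below. This is a sketch under several assumptions and not the equation part of the actual rule: constituents are plain dictionaries with hypothetical keys (`cat`, `case`, `agr`, `subcat`), a nominative noun phrase is always taken to be the subject, the voice of the verb is ignored, and step 4 is approximated by allowing at most one object per case instead of per thematic role.

```python
from typing import List, Optional


def sentence_rule(constituents: List[dict]) -> Optional[dict]:
    """Return an f-structure for the constituents, or None if the rule fails."""
    # Step 1: exactly one constituent must be the verb phrase.
    verbs = [c for c in constituents if c["cat"] == "VP"]
    if len(verbs) != 1:
        return None
    verb = verbs[0]
    fs = {"VERB": verb, "SUBJ": None, "OBJS": {}, "ADVCOMPLEMENTS": []}

    # Step 2: classify the remaining constituents by category and case.
    for c in constituents:
        if c is verb:
            continue
        if c["cat"] == "ADVP":
            fs["ADVCOMPLEMENTS"].append(c)
        elif c["cat"] == "NP" and c["case"] == "NOM":
            if fs["SUBJ"] is not None:
                return None  # two subjects
            fs["SUBJ"] = c
        elif c["cat"] == "NP" and c["case"] in verb["subcat"]:
            if c["case"] in fs["OBJS"]:
                return None  # step 4 (approximated): slot already filled
            fs["OBJS"][c["case"]] = c
        else:
            return None  # constituent cannot be attached

    # Step 3: the verb must have taken all the objects it requires.
    if set(fs["OBJS"]) != set(verb["subcat"]):
        return None

    # Step 5: subject-verb agreement.
    subj = fs["SUBJ"]
    if subj is None:
        # Covert subject: it inherits the agreement feature of the verb.
        fs["SUBJ"] = {"cat": "NP", "case": "NOM", "agr": verb["agr"], "covert": True}
    elif subj["agr"] == "3PL":
        if verb["agr"] not in ("3SG", "3PL"):
            return None
    elif subj["agr"] != verb["agr"]:
        return None
    return fs


np = {"cat": "NP", "case": "NOM", "agr": "3PL", "lex": "çocuklar"}
vp = {"cat": "VP", "agr": "3PL", "subcat": [], "lex": "geldiler"}

print(sentence_rule([np, vp]) is not None)  # subject before verb
print(sentence_rule([vp, np]) is not None)  # inverted order, same rule
```

Because the categories are checked in the body of the rule and not in the phrase structure pattern, the same rule accepts both orders of the two constituents.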






Table 3: Statistical information about the test runs.

Doc     #S      #S in    #S      #S after    #P per    Secs per
                Scope    ign.    Pre-ed.     Sent.     Sent.
1       43      30       0       55          4.28      12.26
2       51      41       2       62          5.02      8.92
3       56      48       1       64          4.87      10.28
4       80      70       0       97          3.25      7.46
Tot.    230     189      3       279         -         -
        100%    82%

#S: Number of sentences, #P: Number of parses.


in a Unix environment, on SUN Sparcstations at Center for
Cognitive Science, University of Edinburgh. 12

In all of the texts there were some sentences outside our
scope. These were:

• sentences that contain finite sentences as their constituents
or modifiers of their constituents,

• conditional sentences,

• finite sentences that are connected by coordinators
(and/or), and

• sentences with discontinuous constituents. 13

We pre-edited the texts so that the sentences were in
our scope (e.g., separated finite sentences connected by
coordinators and parsed them as independent sentences, and
ignored the conditional sentences). Table 3 presents some
statistical information about the test runs. The first, second
and third columns show the document number, the total
number of sentences and the number of sentences that we
could parse without pre-editing, respectively. The other
columns show the number of sentences that we totally ignored,
the number of sentences in the pre-edited versions of
the documents, average number of parses per sentence generated
and average CPU time for each of the sentences in the
texts, respectively. It can be seen that our grammar can
successfully deal with about 82% of the sentences that we have
experimented with, with almost all remaining sentences
becoming parsable after a minor pre-editing. This indicates
that our grammar coverage is reasonably satisfactory.

Below, we present the output for a sentence which shows
very nicely where the structural ambiguity comes out in
Turkish. 14 The output for (8a) indicates that there are four
ambiguous interpretations for this sentence as indicated in
(8b-e): 15

(8a) Küçük kırmızı top gittikçe hızlandı.

     Küçük      little
     kırmızı    red / red paint+3SG-POSS
     top        ball
     gittikçe   go+GER / gradually
     hızlandı   speed up+PAST+3SG

(8b) The little red ball gradually sped up.
(8c) The little red (one) sped up as the ball went.
(8d) The little (one) sped up as the red ball went.
(8e) It sped up as the little red ball went.

The output of the parser for the first interpretation is
given in Figure 3. This output indicates that the subject of
the sentence is a noun phrase whose modifier part is küçük,
and modified part is another noun phrase whose modifier
part is kırmızı and modified part is top. The agreement
of the subject is third person singular, case is nominative,
etc. Hızlandı is the verb of the sentence, and its voice is
active, tense is past, agreement is third person singular, etc.
Gittikçe is a temporal adverbial complement.

Figures 4 through 7 illustrate the c-structures of the four
ambiguous interpretations (8b-e), respectively: 16

• In (8b), the adjective kırmızı modifies the noun top,
and this noun phrase is then modified by the adjective
küçük. The entire noun phrase functions as the subject
of the main verb hızlandı, and the gerund gittikçe
functions as an adverbial adjunct of the main verb.

• In (8c), the adjective kırmızı is used as a noun, and is
modified by the adjective küçük. 17 This noun phrase
functions as the subject of the main verb. The noun
top functions as the subject of the gerund gittikçe, and
this non-finite clause functions as an adverbial adjunct
of the main verb.

• In (8d), the adjective küçük is used as a noun, and
functions as the subject of the main verb. The noun
phrase kırmızı top functions as the subject of the gerund
gittikçe, and this non-finite clause functions as an
adverbial adjunct of the main verb.

• In (8e), the noun phrase küçük kırmızı top functions
as the subject of the gerund gittikçe (cf. (8b) where
it functions as the subject of the main verb), and this
non-finite clause functions as an adverbial adjunct of
the main verb. Note that the subject of the main verb
in this interpretation (i.e., it) is a covert one. Hence, it
does not appear in the c-structure shown in Figure 7.

12 We should however note that the times reported are exclusive of
the time taken by the morphological processor, which with a 24,000
word root lexicon is rather slow and can process about 2-3 lexical
forms per second. We have, however, ported our morphological analyzer
to the XEROX TWOL system developed by Karttunen and Beesley
[Karttunen and Beesley 1992] and this system can process about 500 forms
a second. We intend to integrate this to our system soon.

13 Word order freeness in Turkish allows various kinds of discontinuous
constituents, e.g., an adverbial adjunct cutting in the middle of a compound
noun.

14 This example is not in any of the texts mentioned above. It is taken
from the first author's thesis [Güngördü 1993].

15 In fact, this sentence has a fifth interpretation due to the lexical
ambiguity of the second word. In Turkish, kırmız is the name of a shining, red
paint obtained from an insect with the same name. So, (8a) also means 'His
little red paint sped up as the ball went.' However, this is very unlikely to
come to mind even for native speakers.

16 The c-structures given here are simplified by removing some nodes
introduced by certain intermediary rules to increase readability.

17 In Turkish, any adjective can be used as a noun.
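
One way to see why (8a) has exactly the four readings (8b-e) is to note that they differ only in where the boundary falls between the subject of the main verb and the subject of the gerund. The sketch below enumerates these boundaries. It is an illustration only and not how the parser derives the readings: the function name `readings` is hypothetical, it relies on the observation in footnote 17 that any adjective can head a noun phrase, and it ignores the fifth, lexically ambiguous reading of footnote 15.

```python
from typing import Dict, List


def readings(noun_group: List[str], gerund: str, verb: str) -> List[Dict[str, str]]:
    """Enumerate readings by moving the boundary between the two subjects."""
    out = []
    for k in range(len(noun_group), -1, -1):
        main_subject = noun_group[:k]    # subject of the main verb
        gerund_subject = noun_group[k:]  # subject of the gerund, if any
        out.append({
            "main_subject": " ".join(main_subject) or "(covert)",
            "adjunct": " ".join(gerund_subject + [gerund]),
            "verb": verb,
        })
    return out


for label, r in zip(["8b", "8c", "8d", "8e"],
                    readings(["küçük", "kırmızı", "top"], "gittikçe", "hızlandı")):
    print(label, r)
```

With three words before the gerund there are four possible boundary positions, which matches the four interpretations discussed above.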






;**** ambiguity 1 ***
((SUBJ
  ((*AGR* 3SG) (*CASE* NOM)
   (*DEF* -)
   (*CAT* NP)
   (MODIFIED
    ((*CAT* NP)
     (MODIFIER
      ((*CASE* NOM) (*AGR* 3SG)
       (*LEX* "kIrmIzI")
       (*CAT* ADJ)
       (*R* "kIrmIzI")))
     (MODIFIED
      ((*CAT* N) (*CASE* NOM)
       (*AGR* 3SG)
       (*LEX* "top")
       (*R* "top")))
     (*AGR* 3SG)
     (*CASE* NOM)
     (*LEX* "top")
     (*DEF* -)))
   (MODIFIER
    ((*SUB* QUAL) (*CASE* NOM)
     (*AGR* 3SG)
     (*LEX* "kUCUk")))
  )
 )
 (VERB
  ((*TYPE* VERBAL) (*VOICE* ACT)
   (*LEX* "hIzlandI")
   (*CAT* V)
   (*R* "hIzlan")
   (*ASPECT* PAST)
   (*AGR* 3SG)))
 (ADVCOMPLEMENTS
  ((*SUB* TEMP) (*LEX* "gittikCe")
   (*CAT* ADVP)
   (*CONV*
    ((*WITH-SUFFIX* "dikce")
     (*CAT* V)
     (*R* "git"))))))

Figure 3: Output of the parser for the first ambiguous
interpretation of (8a) (i.e., (8b)).

Figure 4: C-structure for (8b). (Tree diagram not reproduced.)

Figure 5: C-structure for (8c). (Tree diagram not reproduced.)

Figure 6: C-structure for (8d). (Tree diagram not reproduced.)

Figure 7: C-structure for (8e). (Tree diagram not reproduced.)





7 CONCLUSIONS AND SUGGESTIONS 

We have presented a summary and highlights of our current
work on providing an LFG specification for Turkish
syntax. To the best of our knowledge this is the first such
effort for constructing a computational grammar for Turkish.
Our domain includes structurally simple and complex
Turkish sentences. The rather complex morphological analyses
of the agglutinative word structures of Turkish are handled
by a full-scale two-level morphological specification
implemented in PC-KIMMO.

We have a number of directions for improving our grammar
and parser:

• Turkish is very rich in terms of adverbial constructs.
We handle a great deal of these constructs by using a
large number of rules. We are now in the process of
developing a tagger with a multi-word construct recognizer
to preprocess the text so that many multi-word
and idiomatic constructs can be handled outside the
grammar. In this way, multi-word constructs such as
yapar yapmaz (do+AOR+3SG do+NEG+AOR+3SG)
(as soon as (one) does (that)), where both lexical categories
are verbal but the compound construct is an
adverb, can be handled, as can idiomatic constructs
like yanı sıra (side+3SG-POSS row) (besides), where
the function and semantics of the multi-word construct
have nothing to do with the function and semantics of
the constituent lexical forms.

• We are currently working on extending the subset of
sentences dealt with in respect of structure.

• We are currently working on augmenting our lexicon
with substantial lexical information and selectional restriction
information to be used with an integrated ontological
database.
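
As an indication of what such a multi-word construct recognizer might do for the yapar yapmaz pattern, consider the following sketch. It is purely hypothetical and does not describe the tagger under development: the token format (dictionaries with `surface`, `root`, `cat`, and `feats`) and the function name `merge_aorist_pairs` are assumptions made for illustration.

```python
def merge_aorist_pairs(tokens):
    """Merge 'V+AOR+3SG V+NEG+AOR+3SG' pairs on the same root into one adverb."""
    merged = []
    i = 0
    while i < len(tokens):
        cur = tokens[i]
        nxt = tokens[i + 1] if i + 1 < len(tokens) else None
        if (
            nxt is not None
            and cur["root"] == nxt["root"]
            and cur["feats"] == ["AOR", "3SG"]
            and nxt["feats"] == ["NEG", "AOR", "3SG"]
        ):
            # Replace the two verbal tokens with a single adverbial token.
            merged.append({
                "surface": cur["surface"] + " " + nxt["surface"],
                "root": cur["root"],
                "cat": "ADV",
                "feats": [],
            })
            i += 2
        else:
            merged.append(cur)
            i += 1
    return merged


tokens = [
    {"surface": "yapar", "root": "yap", "cat": "V", "feats": ["AOR", "3SG"]},
    {"surface": "yapmaz", "root": "yap", "cat": "V", "feats": ["NEG", "AOR", "3SG"]},
    {"surface": "geldi", "root": "gel", "cat": "V", "feats": ["PAST", "3SG"]},
]

print([(t["surface"], t["cat"]) for t in merge_aorist_pairs(tokens)])
```

Idiomatic constructs such as yanı sıra would more plausibly need a lookup table of fixed expressions than a feature pattern of this kind.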

8 ACKNOWLEDGEMENTS 

We would like to thank Carnegie-Mellon University, Center
for Machine Translation for making available to us their
LFG parsing system. We would also like to thank Elisabet
Engdahl and Matt Crocker of the Centre for Cognitive
Science, University of Edinburgh, for providing valuable
comments and suggestions. This work was done as a part
of a large scale NLP project (TU-LANGUAGE) which is
funded by a NATO Grant under the Science for Stability
Program.

References

[Antworth 1990] E. L. Antworth, PC-KIMMO: A Two-level
        Processor for Morphological Analysis.
        Summer Institute of Linguistics, 1990.

[Erguvanlı 1979] E. E. Erguvanlı, The Function of Word
        Order in Turkish Grammar. PhD thesis, Department
        of Linguistics, University of California,
        Los Angeles, 1979.

[Güngördü 1993] Z. Güngördü, "A lexical-functional
        grammar for Turkish," M.Sc. thesis, Department
        of Computer Engineering and Information
        Sciences, Bilkent University, Ankara,
        Turkey, July 1993.

[Kaplan and Bresnan 1982] R. Kaplan and J. Bresnan, The
        Mental Representation of Grammatical Relations,
        chapter Lexical-Functional Grammar:
        A Formal System for Grammatical Representation,
        pp. 173-281. MIT Press, 1982.

[Karttunen and Beesley 1992] L. Karttunen and K. R.
        Beesley, "Two-level rule compiler," Technical
        Report, XEROX Palo Alto Research Center,
        1992.

[Meskill 1970] R. H. Meskill, A Transformational Analysis
        of Turkish Syntax. Mouton, The Hague, Paris,
        1970.

[Musha et.al. 1988] H. Musha, T. Mitamura, and M. Kee,
        The Generalized LR Parser/Compiler Version
        8.1: User's Guide. Carnegie-Mellon University
        - Center for Machine Translation, April
        1988.

[Oflazer 1993] K. Oflazer, "Two-level description of Turkish
        morphology," in Proceedings of the Sixth
        Conference of the European Chapter of the
        Association for Computational Linguistics,
        April 1993. A full version is to appear in Literary
        and Linguistic Computing, Vol.9 No.2,
        1994.

[Shieber 1986] S. M. Shieber, An Introduction to
        Unification-Based Approaches to Grammar.
        CSLI Lecture Notes 4, 1986.

[Şimşek 1987] R. Şimşek, Örneklerle Türkçe Sözdizimi
        (Turkish Syntax with Examples). Kuzey
        Matbaacılık, 1987.

[Tomita 1987] M. Tomita, "An efficient augmented-context-free
        parsing algorithm," Computational
        Linguistics, vol. 13, 1-2, pp. 31-46,
        January-June 1987.





